Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Utility modules for lists and maps #18

Merged
merged 7 commits into from
Sep 11, 2024

Conversation

sungshik
Copy link
Collaborator

@sungshik sungshik commented Sep 10, 2024

This is a small refactoring PR in anticipation of the next feature PR (too many generic utility functions started to get scattered across the code base and were awkward to reuse from other modules, which we'll need next). Specifically, this PR:

  • Adds new modules MapUtil and ListUtil
  • Adds an explicit type somewhere to help the type checker (unrelated to the new modules, but ok to fix in this PR too)
  • Removes some obsolete code that's no longer used (unrelated to the new modules, but ok to fix in this PR too)

Moving code to new places aside (and rewriting a few comments), none of the moved code has been changed.

Comment on lines -82 to -94
@synopsis{
Adds a TextMate rule to both the repository and the patterns of TextMate
grammar `g`
}

TmGrammar addRule(TmGrammar g, TmRule r)
= g [repository = g.repository + (r.name: r)]
[patterns = appendIfAbsent(g.patterns, include("#<r.name>"))];

// TODO: This function could be moved to a separate, generic module
private list[&T] appendIfAbsent(list[&T] vs, &T v)
= v in vs ? vs : vs + v;

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code has become unused and is just removed

@sungshik sungshik marked this pull request as ready for review September 10, 2024 08:22
}

bool isStrictPrefix(list[&T] l1, list[&T] l2)
= size(l1) < size(l2) && !any(i <- [0..size(l1)], l1[i] != l2[i]);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ps, in the challenge:

bool isStrictPrefix(list[&T] l1, list[&T] l2)
    = size(l1) < size(l2) && l1 == l2[..size(l2);

is even shorter ;)

Or even this ;)

bool isStrictPrefix(list[&T] l1:[_,*_], [*l1, _, *_]) = true;
default bool isStrictPrefix(_, _) = false;

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's brilliant, I concede 😉

Copy link
Collaborator Author

@sungshik sungshik Sep 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I got a bit interested about performance of the various implementations (not really relevant for the current use case; just out of curiosity) and took a few anecdotal measurements. For posterity, here it goes...

Implementations

bool isStrictPrefix1([], [])
    = false;
bool isStrictPrefix1([], [_, *_])
    = true;
bool isStrictPrefix1([_, *_], [])
    = false;
bool isStrictPrefix1([head1, *tail1], [head2, *tail2])
    = head1 == head2 && isStrictPrefix1(tail1, tail2);

// -------------------------------------------------------------------------- //

bool isStrictPrefix2(list[&T] l1, list[&T] l2) {
  if (size(l1) >= size(l2)) {
     return false;
  }
  if (i <- [0..size(l1)], l1[i] != l2[i]) {
     return false;
  }
  return true;
}

// -------------------------------------------------------------------------- //

bool isStrictPrefix3(list[&T] l1, list[&T] l2)
    = size(l1) < size(l2) && !any(i <- [0..size(l1)], l1[i] != l2[i]);

// -------------------------------------------------------------------------- //

bool isStrictPrefix4(list[&T] l1, list[&T] l2)
    = size(l1) < size(l2) && l1 == l2[..size(l1)];

// -------------------------------------------------------------------------- //

bool isStrictPrefix5(list[&T] l1:[_,*_], [*l1, _, *_]) = true;
default bool isStrictPrefix5(_, _) = false;

Measurements

import IO;
import util::Benchmark;

void main(int i = 0) {
    cases = (
        "v1": void() { for (_ <- [0..1000]) { isStrictPrefix1([0..i], [0..1000]); } },
        "v2": void() { for (_ <- [0..1000]) { isStrictPrefix2([0..i], [0..1000]); } },
        "v3": void() { for (_ <- [0..1000]) { isStrictPrefix3([0..i], [0..1000]); } },
        "v4": void() { for (_ <- [0..1000]) { isStrictPrefix4([0..i], [0..1000]); } },
        "v5": void() { for (_ <- [0..1000]) { isStrictPrefix5([0..i], [0..1000]); } }
    );
    println(benchmark(cases));
}

Quiz

Question: Which implementation is slowest?

Answer...

isStrictPrefix1 is always slowest. For small prefixes, all other implementations are comparably fast. For larger prefixes, isStrictPrefix2 gets increasingly slower than ...3/4/5 (but never as slow as ...1).

Question: Which implementation is fastest?

Answer...

isStrictPrefix3/4/5 are comparably fast for small prefixes and non-prefixes. For larger prefixes, ...4/5 are clearly fastest (but there is no clear winner between the two).

Question: Which implementation crashes, if any?

Answer...

isStrictPrefix1 crashes with a java.lang.StackOverflowError for a list of size 1000 and a prefix of size 999 (not necessarily a minimal example).

Results

First series

With isStrictPrefix1:

rascal>main(i = 9)
("v1":404,"v2":304,"v3":322,"v4":312,"v5":306)
ok
rascal>main(i = 99)
("v1":1461,"v2":663,"v3":413,"v4":288,"v5":320)
ok
rascal>main(i = 999)
java.lang.StackOverflowError
Second series

Without isStrictPrefix1:

rascal>main(i = 9)
("v2":319,"v3":387,"v4":344,"v5":374)
ok
rascal>main(i = 99)
("v2":594,"v3":450,"v4":332,"v5":421)
ok
rascal>main(i = 999)
("v2":18232,"v3":1713,"v4":577,"v5":520)
ok
rascal>main(i = 1000)
("v2":524,"v3":526,"v4":517,"v5":513)
ok
rascal>main(i = 9999)
("v2":3021,"v3":3003,"v4":3098,"v5":2975)
ok

Copy link
Member

@DavyLandman DavyLandman Sep 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Micro-benchmarks are hard ;) these are doing to much in the loop that is not part of the focus.

On my machine, I get these numbers:

rascal>main_old();
Reloading module Bench
("v2":213,"v3":171,"v4":176,"v5":218)

and after rewriting the benchmark:

void main(int i = 0) {
    full = [0..1000];
    needle = [0..i];
    rounds = [0..1000];
    cases = (
        "v2": void() { for (_ <- rounds) { isStrictPrefix2(needle, full); } },
        "v3": void() { for (_ <- rounds) { isStrictPrefix3(needle, full); } },
        "v4": void() { for (_ <- rounds) { isStrictPrefix4(needle, full); } },
        "v5": void() { for (_ <- rounds) { isStrictPrefix5(needle, full); } }
    );
    println(benchmark(cases));
}

it prints:

Reloading module Bench
("v2":71,"v3":47,"v4":37,"v5":24)

suddenly v5 is the fastest, and the numbers are all quite a bit lower, let's increase the rounds to 100000.

rascal>main()
("v2":908,"v3":852,"v4":839,"v5":187)

so yeah, v5 might be a quite a bit faster than the rest of them ;)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, that's interesting! 🙂

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although I would have kept V4 for readability ;)

@sungshik sungshik merged commit 899fb3d into main Sep 11, 2024
2 checks passed
@sungshik sungshik deleted the utility-modules-for-lists-and-maps branch September 11, 2024 12:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants